A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts

Authors

  • Ruggero Pintus
  • Ying Yang
  • Enrico Gobbetti
  • Holly E. Rushmeier
Abstract

Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects of the past. Massive digitization projects have made this kind of data available to a worldwide population, and pose real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, being a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm to perform robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books.

Similar resources

Margins are more important than text, Historical values of margins, memorial notes and colophons of Manuscripts in Zoroastrian tradition

In the Zoroastrian tradition, the most important challenge and the most ambiguous issue is ambiguity in history and neglect of time and chronology. Perhaps the view that historical time is limited, that the beginning and end of time are clear, and that goodness will eventually prevail is the reason for the ambiguity of history in the Zoroastrian tradition. From early times until now, the Zoroastrian re...

Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting

Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...

Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts

Palm leaf manuscripts were one of the earliest forms of written media and were used in Southeast Asia to store early written knowledge about subjects such as medicine, Buddhist doctrine and astrology. Therefore, historical handwritten palm leaf manuscripts are important for people who like to learn about historical documents, because we can learn more experience from them. This paper presents a...

Robust Line Detection in Historical Church Registers

For being able to automatically acquire information recorded in church registers and other historical scriptures, the text of such documents needs to be segmented prior to automatic reading. Segmentation of old handwritten scriptures is difficult for two main reasons. Lines of text in general are not straight and ascenders and descenders of adjacent lines interfere. The algorithms described in ...

Line Detection and Segmentation in Historical Church Registers

For being able to automatically acquire the information recorded in church registers and other historical scriptures, the writing on these documents has to be recognized. This paper describes algorithms for transforming the paper documents into a representation of text apt to be used as input for an automatic text recognizer. The automatic recognition of old handwritten scriptures is difficult ...

Publication date: 2014